A Knowledge Based Approach for Tackling Mislabeled Multi-class Big Social Data
نویسندگان
چکیده
The performance of classification models extremely relies on the quality of training data. However, label imperfection is an inherent fault of training data, which is impossible manually handled in big data environment. Various methods have been proposed to remove label noises in order to improve classification quality, with the side effect of cutting down data bulk. In this paper, we propose a knowledge based approach for tackling mislabeled multi-class big data, in which knowledge graph technique is combined with other data correction method to perceive and correct the error labels in big data. The knowledge graph is built with the medical concepts extracted from online health consulting and medical guidance. Experimental results show our knowledge graph based approach can effectively improve data quality and classification accuracy. Furthermore, this approach can be applied in other data mining tasks requiring deep understanding.
منابع مشابه
An algorithm for correcting mislabeled data
Reliable evaluation for the performance of classifiers depends on the quality of the data sets on which they are tested. During the collecting and recording of a data set, however, some noise may be introduced into the data, especially in various real-world environments, which can degrade the quality of the data set. In this paper, we present a novel approach, called ADE (automatic data enhance...
متن کاملA robust multi-class AdaBoost algorithm for mislabeled noisy data
AdaBoost has been theoretically and empirically proved to be a very successful ensemble learning algorithm, which iteratively generates a set of diverse weak learners and combines their outputs using the weighted majority voting rule as the final decision. However, in some cases, AdaBoost leads to overfitting especially for mislabeled noisy training examples, resulting in both its degraded gene...
متن کاملAn Approach towards Promoting Iranian Caregivers’ Knowledge on Early Childhood Development
Background: According to the World Health Organization (WHO), parents need to be informed about Early Childhood Development (ECD). Different methods of parents’ education include group-based, face-to-face, book, booklet, web-based, technology-based, and mobile learning using laptops, tablets, and cell phones. Paying attention to caregivers' attitudes is the first step to their education. The ob...
متن کاملA Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection
Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....
متن کاملAutomatic Hashtag Recommendation in Social Networking and Microblogging Platforms Using a Knowledge-Intensive Content-based Approach
In social networking/microblogging environments, #tag is often used for categorizing messages and marking their key points. Also, since some social networks such as twitter apply restrictions on the number of characters in messages, #tags can serve as a useful tool for helping users express their messages. In this paper, a new knowledge-intensive content-based #tag recommendation system is intr...
متن کامل